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MIMO Detection for High-Order QAM Based 
on a Gaussian Tree Approximation 

Jacob Goldberger and Amir Leshem 
Abstract 

This paper proposes a new detection algorithm for MIMO communication systems employing high 
order QAM constellations. The factor graph that corresponds to this problem is very loopy; in fact, it is 
a complete graph. Hence, a straightforward application of the Belief Propagation (BP) algorithm yields 
very poor results. Our algorithm is based on an optimal tree approximation of the Gaussian density of 
the unconstrained linear system. The finite-set constraint is then applied to obtain a loop-free discrete 
distribution. It is shown that even though the approximation is not directly applied to the exact discrete 
distribution, applying the BP algorithm to the loop-free factor graph outperforms current methods in 
terms of both performance and complexity. The improved performance of the proposed algorithm is 
demonstrated on the problem of MIMO detection. 

Index Terms 

Integer Least Squares, High-order QAM, MIMO communication systems, MIMO-OFDM systems. 

I. Introduction 

Finding a linear least squares fit to data is a well-known problem, with applications in almost every 
field of science. When there are no restrictions on the variables, the problem has a closed form solution. 
In many cases, a-priori knowledge on the values of the variables is available. One example is the existence 
of priors, which leads to Bayesian estimators. Another example of great interest in a variety of areas is 
when the variables are constrained to a discrete finite set. This problem has many diverse applications 
such as the decoding of multi-input-multi-output (MIMO) digital communication systems. In contrast to 
the continuous linear least squares problem, this problem is known to be NP hard [8]. 
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We consider a MIMO communication system with n transmit antennas and m receive antennas. The 
tap gain from transmit antenna i to receive antenna j is denoted by Hy. In each use of the MIMO 
channel a vector x = (x±, ...,x n ) T is independently selected from a finite set of points A according to 
the data to be transmitted, so that x € A n . The received vector y is given by: 



y = Hx + e 



(1) 
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The vector e is an additive noise in which the noise components are assumed to be zero mean, statistically 
independent Gaussians with a known variance a 2 1. The mxn matrix H is comprises i.i.d. elements drawn 
from a complex normal distribution of unit variance. The MIMO detection problem consists of finding 
the unknown transmitted vector x given H and y. The task, therefore, boils down to solving a linear 
system in which the unknowns are constrained to a discrete finite set. It is convenient to reformulate 
the complex-valued model into a real valued one. It can be translated into an equivalent double-size 
real-valued representation that is obtained by considering the real and imaginary parts separately: 

Re(H) -Im(H) 
Im(H) Re(H) 

Hence we assume hereafter that H has real values without any loss of generality. The maximum likelihood 
(ML) solution is: 

x = arg min ||Hx — y\\ 2 (2) 

However, going over all the \A\ n vectors is unfeasible when either n or |„4| are large. 

A simple sub-optimal solution, known as the Zero-Forcing algorithm, is based on a linear decision 
that ignores the finite set constraint: 

z = (EfH)- 1 !!^ (3) 

and then, neglecting the correlation between the symbols, finding the closest point in A for each symbol 
independently: 

Xi = arg min \zi — a\ (4) 

aeA 

This scheme performs poorly due to its inability to handle ill-conditioned realizations of the matrix 
H. Somewhat better performance can be obtained by using a minimum mean square error (MMSE) 
Bayesian estimation on the continuous linear system. Let e be the mean symbol energy. We can partially 
incorporate the information that x £ A n by using the prior Gaussian distribution x ~ M(0,el). The 
MMSE estimation becomes: 

2 

E(x\y) = (H T H + — iy^y (5) 
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and then the finite-set solution is obtained by finding the closest lattice point in each component indepen- 
dently. A vast improvement over the linear approaches described above can be achieved by the V-BLAST 
algorithm that is based on sequential decoding with optimal ordering ifTTTl . 

These linear type algorithms can also easily provide probabilistic (soft-decision) estimates for each 
symbol. However, there is still a significant gap between the detection performance of the V-BLAST 
algorithm and the performance of the ML detector. 

Many alternative methods have been proposed to approach the ML detection performance. The sphere 
decoding (SD) algorithm finds the exact ML solution by searching the nearest lattice point. JH, 11251 . 
JH, |[28l . Although the SD reduces computational complexity compared to the exhaustive search of ML 
solution, sphere decoding is not feasible for high-order QAM constellations. While sphere decoding 
has been empirically found to be computationally very fast for small to moderate problem sizes (say, for 
n < 20 for 16-QAM), the sphere decoding complexity would be prohibitive for large n, higher order QAM 
and/or low SNRs [ 14]. Another family of MIMO decoding algorithms is based on semidefinite relaxation 
(e.g. IT3TI . (271, (HI)- Although the theoretical computational complexity of semidefinite relaxation is 
a low degree polynomial, in practice the running time is very high. Thus, there is still a need for low 
complexity detection algorithms that perform well. 

This study attempts to solve the MIMO decoding problem using the Belief Propagation (BP) paradigm. 
It is well-known (see e.g. [26 ]) that a straightforward implementation of the BP algorithm to the MIMO 
detection problem yields very poor results since there are a large number of short cycles in the underlying 
factor graph. In this study we introduce a novel approach to utilize the BP paradigm for MIMO detection. 
The proposed variant of the BP algorithm is both computationally efficient and achieves near optimal 
results. A preliminary version of this paper appears in ifTOTl . The paper proceeds as follows. In Section II 
we discuss previous attempts to apply variants of the BP algorithm to the MIMO decoding problem. The 
proposed algorithm which we dub 'The Gaussian-Tree-Approximation (GTA) Algorithm' is described in 
Section III. Experimental results are presented in Section IV. 

II. The Loopy Belief Propagation Approach 

Given the constrained linear system y = Hx + e, and a uniform prior distribution on x, the posterior 
probability function of the discrete random vector x given y is: 

ocexp(-— ||H:c-y|| 2 ) , x 6 A n (6) 

The notation oc stands for equality up to a normalization constant. Observing that ||Hx— y\\ 2 is a quadratic 
expression, it can be easily verified that p(x\y) is factorized into a product of two- and single-variable 
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potentials: 

p(xi, ..,x n \y) oc JJ^i(iEi) JJVy'C^ij^i) ( 7 ) 

i i<j 

such that 

ipi{xi) = exp(-^y T hiXi) (8) 

ipij(xi,Xj) = exp(--^hJhjXiXj) 

where hj is the i-th column of the matrix H. Since the obtained factors are simply a function of pairs, 
we obtain a Markov Random Field (MRF) representation P4l . In the MIMO application the (known) 
matrix H is randomly selected and therefore, the MRF graph is usually a completely connected graph 
(see an MRF graph illustration in Fig. [lj. 




Fig. 1. The MRF undirected graphical model corresponds to the MIMO detection problem with n = 7. 

The Belief Propagation (BP) algorithm aims to solve inference problems by propagating information 
throughout this MRF via a series of messages sent between neighboring nodes (see 1291 for an excellent 
tutorial on BP). In the sum-product variant of the BP algorithm applied to the MRF (O, the message 
from Xj to xi is: 

mj-n(xi) = y] (ipj(xj)ipij(xi,Xj) ]~[ m k -+j(xj)) , x { e A (9) 

In each iteration messages are passed along all the graph edges in both edge directions. In every 
iteration, an estimate of the posterior marginal distribution ('belief') for each variable can be computed 
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by multiplying together all of the incoming messages from all the other nodes: 

biixi) = ipi(xi)Y[rn k ^i(xi) , x { £ A (10) 

A variant of the sum-product algorithm is the max-product algorithm in which the summation in Eq. 
© is replaced by a maximization over all the symbols in A. In a loop-free MRF graph the sum- 
product algorithm always converges to the exact marginal probabilities (which corresponds in the case 
of MIMO detection to a soft decision probability of each symbol p(xi\y)). In a loop-free MRF graph the 
max-product variant of the BP algorithm always converges to the most likely configuration ll23l (which 
corresponds to ML decoding in our case). For loop-free graphs, BP is essentially a distributed variant 
of dynamic programming. The BP message update equations only involve passing messages between 
neighboring nodes. Computationally, it is thus straightforward to apply the same local message updates 
in graphs with cycles. In most such models, however, this loopy BP algorithm will not compute exact 
marginal distributions; hence, there is almost no theoretical justification for applying the BP algorithm 
(one exception is that, for Gaussian graphs, if BP converges, then the means are correct 11301 ). However, 
the BP algorithm applied to loopy graphs has been found to have outstanding empirical success in many 
applications, e.g., in decoding LDPC codes |9). The performance of BP in this application may be 
attributed to the sparsity of the graphs. The cycles in the graph are long, hence the graph has tree-like 
properties, so that messages are approximately independent and inference may be performed as though 
the graph was loop-free. The BP algorithm has also been used successfully in image processing and 
computer vision (e.g. [7]) where the image is represented by a grid-structured MRF that is based on local 
connections between neighboring nodes. 

However, when the graph is not sparse, and is not based on local grid connections, loopy BP almost 
always fails to converge. Unlike the sparse graphs of LDPC codes, or grid graphs in computer vision 
applications, the MRF graphs of MIMO channels are completely connected graphs and therefore the 
associated detection performance is poor. This has prevented the BP from being an asset for the MIMO 
problem. Fig. |2] shows an example of a BPSK MIMO system based on an 8 x 8 matrix and A = {—1, 1} 
(see Section IV for a detailed description of the simulation set-up). As can be seen in Fig. |2j the BP 
decoder based on the MRF representation |7]) has very poor results. Standard techniques to stabilize the 
BP iterations such as damping the message updates [21] do not help here. Even applying more advanced 
versions of BP (e.g. Generalized BP and Expectation Propagation) to inference problems on complete 
MRF graphs yields poor results |fl9l . The problem here is not in the optimization method but in the cost 
function that needs to be modified to yield a good approximate solution. 
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Fig. 2. Decoding results for 8x8 BPSK real valued system, A = {—1,1}. 



There have been several recent attempts to apply BP to the MIMO detection problem with good results 
(e.g. 031, ll22l . lfl2l . [16]). However, in these methods the factorization of the probability function is 
done in such a way that each factor corresponds to a single linear equation. This leads to a partition of the 
probability function into factors each of which is a function of all the unknown variables. This results in 
an exponential computational complexity when computing the BP messages. Shental et. al [ 26] analyzed 
the case where the matrix H is relatively sparse (and has a grid structure) (see Fig. [3). They showed 
that even under this restricted assumption the BP still does not perform well. As an alternative method 
they proposed the generalized belief propagation (GBP) algorithm that does work well on the sparse 
matrix if the algorithm regions are carefully chosen. There are situations where the sparsity assumption 
makes sense (e.g. 2D intersymbol interference (ISI) channels). However, in the MIMO channel model 
we assume that the channel matrix elements are i.i.d. and Gaussian; hence we cannot assume that the 
channel matrix H is sparse. 
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Fig. 3. The MRF grid model corresponds to 5 x 5 2D intersymbol interference (ISI) channels. 
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In the recent years there were also several attempts to apply BP for densely connected graphs (mainly 
Gaussian graphs) EDI. 151. El. ifBl. ll22l. 

III. The Gaussian Tree Approximation Algorithm 
Our approach is based on an approximation of the exact probability function: 

p{xi,..,x n \y) oc exp(-^ ||Hx - y|| 2 ) , x £ A n (11) 

that enables a successful implementation of the Belief Propagation paradigm. Since the BP algorithm is 
optimal on loop-free factor graphs (trees) a reasonable approach is finding an optimal tree approximation 
of the exact distribution (fTTT) . Chow and Liu proposed a method to find a tree approximation of a 
given distribution that has the minimal Kullback-Leibler (KL) divergence to the true distribution. They 
showed that the optimal tree can be learned efficiently via a maximum spanning tree whose edge weights 
correspond to the mutual information between the two variables corresponding to the edges endpoints. 
The problem is that the Chow-Liu algorithm is based on the two-dimensional marginal distributions. 
However, finding the marginal distribution of the probability function (fTTT) is, unfortunately, NP hard and 
it is (equivalent to) our final target. 

To overcome this obstacle, our approach is based on applying the Chow-Liu algorithm on the distri- 
bution corresponding to the unconstrained linear system. This distribution is Gaussian and therefore it is 
straightforward in this case to compute the two-dimensional marginal distributions. Given the Gaussian 
tree approximation, the next step of our approach is to apply the finite-set constraint and utilize the 
Gaussian tree distribution to form a discrete loop free approximation of p(x\y) which can be efficiently 
globally maximized using the BP algorithm. To motivate this approach we first apply a simplified version 
to derive the zero-forcing decoding algorithm (0]) described in Section I. 

Let z{y) = (H H) _1 H T y be the least-squares estimator (0 and C = <7 2 (H T H) _1 is its variance. It 
can be easily verified that p{x\y) (fTTT t can be written as: 

1 It 

p{x\y) oc f(x; z, C) = -f===^ exp(--(z - x) C^{z - x)) (12) 

where f(x; z, C) is a Gaussian density with mean z and covariance matrix C. f(x; z, C) can be viewed 
as a posterior distribution of x assuming a non-informative prior. Now, instead of marginalizing the true 
distribution p{x\y), which is an NP hard problem, we approximate it by the product of the marginals of 
the Gaussian density f(x; z, C): 

f( X - z,c)*n f (x t ; Zi , a,) = n ^= ex p(- ( %c!f 2 ) (13) 
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At this stage we apply the finite-set constraint. From the Gaussian approximation (TT3T) we can extract a 
discrete approximation: 

p{x\y)^"[\eM- {Zl ~^f ) , xGA n (14) 

i 

Since this joint probability function is obtained as a product of marginal probabilities, we can address 
each variable separately: 

(zi - a) 2 

p(xi = a\y) oc exp( % — ) , a £ A (15) 

Taking the most likely symbol we obtain the sub-optimal Zero-Forcing solution (0]). 

Motivated by the simple product-of-marginals approximation described above, we suggest approximat- 
ing the discrete distribution p(x\y) via a tree-based approximation of the Gaussian distribution f(x; z, C). 
Although the Chow-Liu algorithm was originally stated for discrete distributions, one can verify that it 
also applies for the Gaussian case. For the sake of completeness we give a detailed derivation below. 



A. Finding the optimal Gaussian tree approximation 

We represent an n-node tree graph by the loop-free parent relations {p(z)}™ =1 such that p(i) is the 
parent node of i. To simplify notation we do not separately describe the root node. The parent of the 
root is implicitly assumed to be the empty set. A distribution g(x\, ...,x n ) is described by a tree {p(i)} 
if it can be written as g(x) = Yli=l 9( x i\ x p{i))- We start with a formula for the KL divergence between 
a Gaussian distribution f(x) and a distribution g(x) defined on the same space that is represented by a 
tree graphical model. 

Theorem 1: Let f(x) = f{x\, x n ) be a multivariate Gaussian distribution and let g{x) = Y\a=i 9i x i\ x p(i)) 
be another distribution that is represented by a loop-free graphical model (tree). The KL divergence 
between / and g is: 

n 

D (f\\g) = ^2 D (f( x i\ x p(i))\\9( x i\ x p(t))) (16) 
i=i 

n 

-h(x) + ^2(h(xi) - I{xi;x p{i) )) 

i=l 

such that / is the mutual information and h is the differential entropy, based on the distribution f(x). 
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Proof: The definition of the KL divergence implies: 

D(f\\g) = //log/- J f{x)Y,\ogg{ Xi \x p{i) ) 



- -h(x)-^2 / f( x ii x p(i)) l °g9(xi\x p (i)) 
i=i J 

n ,. 

f( X i> X p(i)) l °&f( X i\ X p(i)) 

i=l J 

n 

-h(x) + s ^ j D{f{x i \x p{i) )\\g{x i \x p{i) )) 



i=l 



i=l 

From Eq. (fl6l ) it can be easily seen that if we fix a tree graph {p(i)}™ =1 , the tree distribution whose KL 
divergence to f(x) is minimal is g(x) = YVi=i f( x i\ x p(i))- m other words, the best tree approximation 
of f(x) is constructed from the conditional distributions of f(x). For that tree approximation we have: 

n n n 

D U\\ Wf{ x i\ x p(€))) = -h{x) + ^h(xi) -^2l(xi;x p{i) ) (17) 

i=l i=l i=l 

Moreover, since in Eq. (fTTT) h(x) and h(xt) do not depend on the tree structure, the tree topology {p(i)} 
that best approximates f(x) is the one that maximizes the sum: 

Tl 

^ . I( x i) x p{i) ) (18) 

i=i 

A spanning tree of a graph is a subgraph that contains all the vertices and is a tree. Now suppose the 
edges of the graph have weights. The weight of a spanning tree is simply the sum of weights of its edges. 
Eq. (fT8l ) reveals that the problem of finding the best tree approximation of the Gaussian distribution f(x) 
can be reduced to the well-known problem of finding the maximum spanning tree of the weighted n-node 
graph where the weight of the i-j edge is the mutual information between Xi and Xj 0. It can be easily 
verified that the mutual information between two r.v. x,- L and xj that are jointly Gaussian is: 

I(x i -x J ) = -\og{l-p%) (19) 

where pij is the correlation coefficient between Xi and Xj. 

There are several algorithms to find a minimum spanning tree. They all utilize a greedy approach. In 
this work we use the Prim algorithm [24] which is efficient and very simple to implement. The Prim 
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algorithm begins with some vertex v in a given graph, defining the initial set of vertices T. Then, in 
each iteration, we choose a minimum-weight edge (u, v), connecting a vertex v in the set T to the vertex 
u outside of set T. Then vertex u is brought in to T. This process is repeated until a spanning tree 
is formed. We can use a heap to remember, for each vertex, the smallest edge connecting the current 
sub-tree T with that vertex. The complexity of the Prim's algorithm, that finds the minimum spanning 
tree of an n- vertex graph is 0(n 2 ). 

We note in passing that since the Prim algorithm is based on a greedy approach, it only relies on the 
order of the weights and not on their exact values. Hence, applying a monotonically increasing function 
on the graph weights does not change the topology of the optimal tree. To find the optimal Gaussian 
tree approximation we can, therefore, use the weights p\- instead of I(xf,Xj) = — log(l — pfj). The 
optimal Gaussian tree is, therefore, the one that maximizes the sum of the square correlation coefficients 
between adjacent nodes. To summarize, the algorithm that finds the best Gaussian tree approximation is 
as follows. Define x\ as the root. Then find the edge connecting a vertex in the tree to the vertex outside 
the tree, such that the corresponding square correlation coefficient is maximal and add the edge to the 
tree. Continue this procedure until a spanning tree is obtained. 

B. Applying BP on the tree approximation 

Let f(x) be the optimal Chow-Liu tree approximation of f(x; z, C) (fl2l ). We can assume, without loss 

of generality, that f(x) is rooted at x\. f(x) is a loop-free Gaussian distribution on xi, ...,x n , i.e. 

n 

f(x) = f(x 1 ;z,C)Hf(x i \x p(i y,z,C) , x 6 R n (20) 
i=2 

where p(i) is the 'parent' of the i-th node in the optimal tree. The Chow-Liu algorithm guarantees 
that f(x) is the optimal Gaussian tree approximation of f(x; z, C) in the sense that the KL divergence 
D(f\\f) is minimal. 

Given the Gaussian tree approximation, the next step of our approach is to apply the finite-set constraint 
to form a discrete loop free approximation of p{x\y) which can be efficiently globally maximized using 
the BP algorithm. Our approximation approach is, therefore, based on replacing the true distribution 
p{x\y) with the following approximation: 

n 

p(xi,...,x n \y) <x f(x) = f(xr,z,C) Y[f(xi\x p( iy,z,C) , x G A n (21) 

i=2 

The probability function p{x\y) is a loop free factor graph. Hence the BP algorithm can be applied 
to find its most likely configuration. We next derive the messages of the BP algorithm that is applied 
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Input: A constrained linear LS problem: Hx + e = y, a noise level a 2 and a finite symbol set A 
whose the mean symbol energy is denoted by e. 

Algorithm: 

. Compute z = (H H + ^I)~ l ¥Cy and C = <r 2 (H T H + ^I)' 1 . 
• Denote: 

ti n\ ( ^ ( Xi ~ Zi ^ 2 ^ 
f{Xi;z,C) = exp(-~ - ) 



/(xjlxjjz, C) = exp(-- 



1 ~~ Zi) — Cij /Cjj(xj — Zj)) 2 

2 Cu - Cfj/Cjj 



Compute maximum spanning tree of the n-node graph where the weight of the i-j edge is 

the square of the correlation coefficient: pfj = Cfj/(CnCjj) 

Assume the tree is rooted at node '1' and denote the parent of node i by p(i). 

Apply BP on the loop free distribution: 

n 

p(x 1 ,...,x n \y) oc f(x 1 ;z,C)Y[f{xi\x p{i y,z,C) x 1 ,...,x n G A 

i=2 

to find the (approx. to the) most likely configuration. 



Fig. 4. The Gaussian Tree Approximation (GTA) Algorithm. 



on p(x\y). An optimal BP schedule, when applied to a tree, requires passing a message once in each 
direction of each edge ifTTl . The BP messages are first sent from leaf variables 'downward' to the root. 
The computation begins at the leaves of the graph. Each leaf variable node sends a message to its parent. 
Each vertex waits for messages from all of its children before computing the message to be sent to its 
parent. The 'downward' BP message from a variable x- L to its parent variable x p ^ is computed based on 
all the messages Xi received from its children: 

mi^p(i){x p (i)) = ^2 f(xi\x p (i);z,C) Yl rnj^i{xi) (22) 

XiEA j\'P{j)=i 

If Xi is a leaf node in the tree then the message is simply: 

mi^ p (i)(x p (i)) = ^2 f{xi\x p (iy,z,C) (23) 

The 'downward' computation terminates at the root node. 

Next, BP messages are sent 'upward' back to the leaves. The computation begins at the root of the 
graph. Each vertex waits for a message from its parent before computing the messages to be sent to each 
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of its children. The 'upward' BP message from a parent variable x p ij\ to its child variable X{ is computed 
based on the 'upward' message x p ^ received from its parent x p ( p (;)) and from 'downward' messages 
that x p u\ received from all the siblings of xf. 

™ P (i)-*i(Xi) = 22 f( X i\ X p(i)> Z > C ) m p(p(i))->p(i)( X p(i)) X ( 24 ) 

W m j^rp(i){ x p(i)) i Xi a A. 

{j\j¥=i,PU)=P( i )} 

If x p u\ is the root of the tree then the message is simply: 

m p( i)_n{xi)= ^2 f( x ii x p(iYi z i C ) x ( 25 ) 

\\ m j-¥p(i)( x p(i)) ) X{ £ A 

{i\j¥=i>p{j)=p{i)} 

After the downward-upward message passing procedure is completed we can compute the 'belief at 
each variable which is the product of all the messages sent to the variable from its parent and from its 
children (if there are any). 

beliefj(xi) = m p ^i(xi) f\ ™<j->i( x i) > Xi £ A (26) 

j\p{j)=i 

In the case X{ is the root node, the 'belief computed is as follows: 

belief (xi) = / (xi ; z, C) \\ ) , Xi £ A. (27) 

j\p(j)=i 

Since the approximated distribution p(x\y) (|2TI) is loop free, the general Belief Propagation theory 
guarantees that (the normalized) belief vector is exactly the marginal distribution p(xi\y) of the approxi- 
mated distribution p(x\y) (|2"T1 ). To obtain a hard-decision decoding we choose the symbol whose posterior 
probability is maximal: 

Xi = argmax belief (a) , a £ A (28) 

a 

Above we described the sum-product version of the BP algorithm that computes the marginal prob- 
abilities p(xi\y). A variant of the sum-product algorithm is the max-product algorithm in which the 
summation in Eq. (l22l-(|25T) is replaced by a maximization over all the symbols in A. The max-product 
algorithm finds the most likely pattern of the approximation p(x\y). We did not observe any significant 
performance difference using either the sum-product or the max-product variants for decoding MIMO 
systems presented in the experiment section. The max-product is more computationally efficient since 
the BP messages can be entirely computed in the log-domain. 
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C. An MMSE version of a tree approximation 

The MMSE Bayesian approach (O is known to be better than the zero-forcing solution ©. In 
MMSE we partially incorporate the information that x G A n by using the prior Gaussian distribution 
x ~ J\f(0,el). In a similar way we can consider a Bayesian version of the proposed Gaussian tree 
approximation. We can partially incorporate the information that x G A n by using the prior Gaussian 
distribution x ~ A/"(0, el) such that e = pn ^2 a£ ^ a 2 . This yields the posterior Gaussian distribution: 

Mxly)= 7mwm x (29) 

exp(-i(x - E(x\y)) T (H T H + —I){x - E{x\y)) 
2 e 

such that E(x\y) = (H T H + ^J) -1 H T j/ and V(x\y) = (H T H + ^J) _1 . The MMSE method is obtained 

by approximating the unconstrained posterior distribution f( x \ y )(x\y) by a product of marginals. In our 

approach we use, instead, the best loop-free Gaussian approximation. We can apply the Chow-Liu tree 

approximation on the Gaussian distribution d29l ) to obtain a 'Bayesian' Gaussian tree approximation for 

p(x\y). This way we partially use the finite set constraint when we search for the best tree approximation 

of the true discrete distribution p(x\y). This is likely to yield a better approximation of the discrete 

distribution p(x\y) than the tree distribution which is based on the unconstrained distribution f(x; z, C). 

To summarize, our solution to the MIMO decoding problem is based on applying BP on a discrete 

version of the Gaussian tree approximation of the Bayesian version of the continuous least-square solution. 

We dub this method "The Gaussian-Tree-Approximation (GTA) Algorithm". The GTA algorithm is 

summarized in Fig. IIII-BI We next compute the complexity of the GTA algorithm. The complexity of 

computing the covariance matrix (H H + ^-I)^ 1 is 0(n 3 ), the complexity of the Chow-Liu algorithm 

(based on Prim's algorithm for finding the minimum spanning tree) is 0(n 2 ) and the complexity of the 

BP algorithm is 0(\A\ 2 n). 

IV. Experimental Results 

In this section we provide simulation results for the GTA algorithm over various MIMO systems. The 
channel matrix comprised i.i.d. elements drawn from a zero-mean complex normal distribution of unit 
variance. We used 500, 000 realizations of the channel matrix and each matrix was used once for sending 
a message. The performance of the proposed algorithm is shown as a function of the variance of the 
additive noise a 2 . The signal-to-noise ratio (SNR) is defined as 10 log 10 (^ s /A r o) where E s /Nq = 2f (n 
is the number of variables, a 2 is the variance of the Gaussian additive noise, and e is the mean symbol 
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Fig. 5. Comparison of various detectors in 12 x 12 system, 16-QAM symbols. 

energy). We compared the performance of the GTA method to the V-BLAST (MMSE-SIC) algorithm 
with optimal ordering for the successive interference cancelation [ 1 1], and to the Schnorr-Euchner variant 
of sphere decoding (SE-SD) with infinite radius JH, |25l . We used sorting of the channel matrix using the 
SQRD algorithm [33 ] and regularization |[32l , which substantially reduces the computational complexity. 

We also implemented the SDR detector suggested by Sidiropoulos and Luo ll27l . Recently it was 
shown HH that for the cases of 16-QAM and 64-QAM this SDR based method is equivalent to the 
SDR based detection method suggested by Weisel, Eldar and Shamai [31 ]. The SDRs were solved using 
the CSDP package [4[. In the SDR Gaussian randomization step, 100 independent randomizations were 
implemented. All the MIMO detection algorithms were implemented in C for efficiency. 

Fig. [5] shows MIMO detection performance for a 12 x 12 MIMO system using 16-QAM. The methods 
that are shown are V-BLAST, GTA, SDR and sphere-decoding. In this case the SDR outperforms both 
the V-BLAST and the GTA methods. However, in this case it is still feasible to compute the optimal 
maximum-likelihood algorithm using the sphere decoding algorithm. 

The SD-SE is the favorite detection method when the problem size is small or moderate. In this case the 
SD-SE can always yield the exact ML solution at acceptable computational cost. In large size problems, 
however, there is still a need for good approximation methods. Fig. [6] shows the SER versus SNR and 
worst case execution time versus SNR for the 12 x 12 system using 64-QAM. Fig. [7] shows the same 
experimental results for the 16 x 16 system using 64-QAM. To assess the computational complexity we 
used a measure of the worst case rather than the average case since in online applications we have to 
decode within a specified time. The choice between execution time or number of floating point operations 
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is debatable. The differences in running time between the methods we implemented was in orders of 
magnitude and running time is easier to appreciate. 
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Fig. 6. 12 x 12 system, 64-QAM symbols, (a) SER versus SNR, (b) max seconds for decoded symbol vector versus SNR. 
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Fig. 7. 16 x 16 system, 64-QAM symbols, (a) SER versus SNR, (b) max seconds for decoded symbol vector versus SNR. 

As can be seen from Figs. [6] and [71 the performance of the GTA algorithm in high SNR is significantly 
better than the V-BLAST. The computational complexity of GTA is comparable to V-BLAST and it is 
two orders of magnitude better than SDR. We note in passing that the performance of the SDR method 
|27l in these 64-QAM cases is worse than that of MMSE-SIC and the computational complexity is much 
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higher. From the derivation of the SDR method it can be seen that the relaxation becomes more crude 
as at higher constellations. 

We next show the performance of several variants of the GTA algorithm. The GTA algorithm differs 
from the ZF, MMSE and MMSE-SIC algorithms in several ways. The first is a Markovian approximation 
of f(x;z,C) instead of an approximation based on a product of independent densities. The second 
aspect is the use of an optimal tree. To clarify the contribution of each component we modified the GTA 
algorithm by replacing the Chow-Liu optimal tree by the tree 1 — >■ 2 — > 3, — >■ n. We call this method 
the 'Line-Tree'. As can be seen from Fig. [8j using the optimal tree is crucial to obtain improved results. 
Fig. [8] also shows the results of the non-Bayesian variant of the GTA algorithm. As can be seen, the 
Bayesian version yields better results. Fig. [8]shows the symbol error rate (SER) versus SNR for a 20x20, 
\A\ =4, real MIMO system. The performance of the GTA method and its variants was compared to the 
MMSE and the MMSE-SIC algorithms. 
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Fig. 8. Comparative results of MMSE, MMSE-SIC and variants of the GTA for 20 x 20 real system ,A = {±1, ±3}. 

V. Conclusion 

We proposed a novel MIMO detection technique based on the principle of a tree approximation of the 
Gaussian distribution that corresponds to the continuous linear problem. The proposed method outper- 
forms previously suggested MIMO decoding algorithms in high order QAM constellation, as demonstrated 
in simulations. Although the proposed method yields improved results, the tree approximation we applied 
may not be the best one (finding the best tree for the integer constrained linear problem is NP hard). It is 
left for future research to search for a better discrete tree approximation for the constrained linear least 
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squares problem. This paper dealt with a tree approximation approach, more complicated approximations 
such as multi-parent trees could improve performance and can potentially provide a smooth performance- 
complexity trade-off. 

While the method provides excellent performance, it is worthwhile mentioning that the method provides 
a-posteriori probabilities for each variable that can be used to improve performance. This is done by 
applying a Schnorr-Euchner sphere decoding algorithm |[25l . where we order the symbols according 
to their a-posteriori probabilities. This can lead to very close to optimal performance in a significantly 
reduced complexity. It is because that by computing the a-posteriori probabilities, we have a much higher 
probability of finding the true solution during the first search, therefore significantly reducing the search 
radius. 

There are several important applications of the proposed technique. We comment here on combining 
it into communication systems with coding and interleaving. It is useful both for single carrier and 
OFDM systems. It can serve as a MIMO decoder for wireless communication systems. Using the a- 
posteriori probability distribution of the symbols we can easily compute the a-posteriori probability and 
the likelihood ratio for the bits. The technique can be combined with MIMO-OFDM system with bit 
interleaved coded modulation or with trellis coded modulation by joint coding of over all frequency tones 
of the OFDM system, and running the decoder for each tone independently. 

In this paper we focused on the MIMO detection problem. The proposed method, however, can be 
applied to solve constrained linear least squares problems which is an important issue in many fields. 
A main concept in the GTA model is the interplay between discrete and Gaussian models. Such hybrid 
ideas can be considered also for discrete inference problems other than least-squares. 
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